An Exact Algorithm to Compute the DCJ Distance for Genomes with Duplicate Genes

Authors

  • Mingfu Shao
  • Yu Lin
  • Bernard M. E. Moret
Abstract

Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this paper, we propose an ILP (integer linear programming) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse and rat genomes, where once again our method outperforms MSOAR.
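The linear-time case mentioned in the abstract (genomes without duplicate genes) can be illustrated with a minimal sketch. This is not the ILP method of the paper. It is the standard cycle-counting formula for the duplicate-free case, restricted here to genomes made only of circular chromosomes with identical gene content, where the DCJ distance equals the number of genes minus the number of cycles in the adjacency graph. The function names `adjacency_map` and `dcj_distance_circular` and the input encoding (lists of signed integers) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Extremity = Tuple[int, str]  # (gene id, "h" for head or "t" for tail)
Genome = List[List[int]]     # circular chromosomes as lists of signed, nonzero gene ids


def adjacency_map(genome: Genome) -> Dict[Extremity, Extremity]:
    """Map every gene extremity to the extremity adjacent to it (circular chromosomes only)."""
    partner: Dict[Extremity, Extremity] = {}
    total_genes = 0
    for chrom in genome:
        k = len(chrom)
        total_genes += k
        for i in range(k):
            left, right = chrom[i], chrom[(i + 1) % k]
            # Right end of the left gene: head if forward, tail if reversed.
            a = (abs(left), "h" if left > 0 else "t")
            # Left end of the right gene: tail if forward, head if reversed.
            b = (abs(right), "t" if right > 0 else "h")
            partner[a] = b
            partner[b] = a
    if len(partner) != 2 * total_genes:
        raise ValueError("duplicate genes are not supported by this sketch")
    return partner


def dcj_distance_circular(genome_a: Genome, genome_b: Genome) -> int:
    """DCJ distance n - c for duplicate-free circular genomes with equal gene content."""
    adj_a = adjacency_map(genome_a)
    adj_b = adjacency_map(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must contain exactly the same genes")

    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle by alternating an adjacency of A with an adjacency of B.
        while x not in seen:
            y = adj_a[x]
            seen.update((x, y))
            x = adj_b[y]

    n_genes = len(adj_a) // 2
    return n_genes - cycles
```

Each extremity is visited once, so the running time is linear in the number of genes. With duplicate genes the adjacency graph is no longer uniquely determined, since one must first choose which copies correspond to each other; that choice is what makes the problem NP-hard and motivates the ILP formulation described in the abstract.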

Related resources

Models and Algorithms for Comparative Genomics

The deluge of sequenced whole-genome data has motivated the study of comparative genomics, which provides global views on genome evolution, and also offers practical solutions in deciphering the functional roles of components of genomes. A fundamental computational problem in whole-genome comparison is to infer the most likely large-scale events (rearrangements and content-modifying events) of ...

Genomic Distance with DCJ and Indels

The double cut and join (DCJ) operation, introduced by Yancopoulos, Attie and Friedberg in 2005, allows one to represent most rearrangement events in genomes. However, a DCJ cannot perform an insertion or a deletion and most approaches under this model consider only genomes with the same content and without duplications, including the linear time algorithms to compute the DCJ distance and to fi...

Exemplar or Matching: Modeling DCJ Problems with Unequal Content Genome Data

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels are considered. In this paper, we compare two mainstream methods to deal with duplications and associate them with Indels: one by deletion, namely DCJ-Indel-Exemplar distance;...

DCJ Median Problems on Linear Multichromosomal Genomes: Graph Representation and Fast Exact Solutions

In genome rearrangement, given a set of genomes G and a distance measure d, the median problem asks for another genome q that minimizes the total distance [Formula: see text]. This is a key problem in genome rearrangement based phylogenetic analysis. Although this problem is known to be NP-hard, we have shown in a previous article, on circular genomes and under the DCJ distance measure, that a ...

Algorithms for sorting unsigned linear genomes by the DCJ operations

Motivation: The double cut and join operation (abbreviated as DCJ) has been extensively used for genomic rearrangement. Although the DCJ distance between signed genomes with both linear and circular (uni- and multi-) chromosomes is well studied, the only known result for the NP-complete unsigned DCJ distance problem is an approximation algorithm for unsigned linear unichromosomal genomes. In thi...

Publication date: 2014